Word Sense Induction for Machine Translation
Author
Abstract
Machine translation research has progressed from phrase- and syntax-based models to semantics-based ones, and from single-sentence translation to discourse- and document-level translation. This talk presents our word sense-based translation model for statistical machine translation (SMT), an instance of semantics-based SMT research at the word-sense level. The sense in which a word is used determines its translation. The talk begins with how to build a broad-coverage sense tagger based on a nonparametric Bayesian topic model that automatically learns sense clusters for words in the source language. It then focuses on the proposed word sense-based translation model, which uses maximum entropy classifiers to let the decoder select appropriate translations for source words according to their inferred senses. The talk ends with experimental results and some conclusions. To the best of our knowledge, this is the first attempt to empirically verify the positive impact of lexical semantics (word sense) on translation quality. This is joint work with Deyi Xiong, Soochow University.
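As a rough illustration of the second stage described in the abstract, the sketch below trains a tiny maximum entropy (multinomial logistic regression) classifier that picks a translation for a source word from its induced sense and a context word. It is not the authors' implementation: the function names, the feature strings, the French translations, and the toy training data are all hypothetical, and the sense ids are assumed to come from an upstream sense tagger that is not shown here.

```python
import math
from collections import defaultdict


def predict_proba(weights, feats, labels):
    """Return P(label | feats) under a log-linear (maximum entropy) model."""
    scores = {y: sum(weights[(f, y)] for f in feats) for y in labels}
    top = max(scores.values())  # subtract the max for numerical stability
    exps = {y: math.exp(s - top) for y, s in scores.items()}
    total = sum(exps.values())
    return {y: e / total for y, e in exps.items()}


def train_maxent(examples, labels, epochs=200, lr=0.5):
    """Fit feature weights by batch gradient ascent on the log-likelihood."""
    weights = defaultdict(float)  # (feature, label) -> weight
    for _ in range(epochs):
        grad = defaultdict(float)
        for feats, gold in examples:
            probs = predict_proba(weights, feats, labels)
            for f in feats:
                for y in labels:
                    # observed feature count minus expected feature count
                    grad[(f, y)] += (1.0 if y == gold else 0.0) - probs[y]
        for key, g in grad.items():
            weights[key] += lr * g / len(examples)
    return weights


def best_translation(weights, feats, labels):
    """Pick the most probable translation for the given features."""
    probs = predict_proba(weights, feats, labels)
    return max(labels, key=lambda y: probs[y])


# Hypothetical toy data for the source word "bank".
# "sense=0" and "sense=1" stand in for induced sense cluster ids.
LABELS = ["banque", "rive"]
EXAMPLES = [
    (("sense=0", "ctx=money"), "banque"),
    (("sense=0", "ctx=loan"), "banque"),
    (("sense=1", "ctx=river"), "rive"),
    (("sense=1", "ctx=water"), "rive"),
]

WEIGHTS = train_maxent(EXAMPLES, LABELS)

if __name__ == "__main__":
    # The context word is unseen, so only the sense feature drives the choice.
    print(best_translation(WEIGHTS, ("sense=0", "ctx=unseen"), LABELS))
    print(best_translation(WEIGHTS, ("sense=1", "ctx=unseen"), LABELS))
```

In this toy setup the sense feature alone is enough to separate the two translations, which is the intuition behind conditioning translation choice on inferred senses; a real system would use many more features and a classifier per source word.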
Similar resources
Word Sense Induction for Better Lexical Choice
Most words in natural languages are polysemous; that is, they have multiple possible meanings or senses. The sense in which a word is used determines its translation. We show that incorporating a sense-based translation model into a statistical machine translation model consistently improves translation quality across all different test sets of five different language-pairs,...
Session 6: Lexicon And Lexical Semantics
While other word-level marking tasks such as morphology and part-of-speech tagging have arrived recently at a well-developed methodology and a basis for comparing results across systems, the robust discrimination of word senses in text is a less mature discipline. Yet, word sense discrimination is central to many natural language processing tasks, such as data extraction and machine translation.
Word Sense Discovery Based on Sense Descriptor Dissimilarity
In machine translation, information on word ambiguities is usually provided by the lexicographers who construct the lexicon. In this paper we propose an automatic method for word sense induction, i.e. for the discovery of a set of sense descriptors for a given ambiguous word. The approach is based on the statistics of the distributional similarity between the words in a corpus. Our algorithm wor...
Improving Word Sense Disambiguation in Neural Machine Translation with Sense Embeddings
Word sense disambiguation is necessary in translation because different word senses often have different translations. Neural machine translation models learn different senses of words as part of an end-to-end translation task, and their capability to perform word sense disambiguation has so far not been quantified. We exploit the fact that neural translation models can score arbitrary translat...
Word Sense Disambiguation vs. Statistical Machine Translation
We directly investigate a subject of much recent debate: do word sense disambiguation models help statistical machine translation quality? We present empirical results casting doubt on this common, but unproved, assumption. Using a state-of-the-art Chinese word sense disambiguation model to choose translation candidates for a typical IBM statistical MT system, we find that word sense disambiguati...
Publication date: 2014